Parallel Corpus Clean-up Based on Recursive Learning
Similar resources
Corpus-Induced Corpus Clean-up
We explore the feasibility of using only unsupervised means to identify non-words, i.e. typos, in a frequency list derived from a large corpus of Dutch and to distinguish between these non-words and real-words in the language. We call the system we built and evaluate in this paper CICCL, which stands for ‘Corpus-Induced Corpus Clean-up’. The algorithm on which CICCL is primarily based is the an...
TICCLops: Text-Induced Corpus Clean-up as online processing system
We present the ‘online processing system’ version of Text-Induced Corpus Clean-up, a web service and application open for use to researchers. The system has over the past years been developed to provide mainly OCR error post-correction, but can just as fruitfully be employed to automatically correct texts for spelling errors, or to transcribe texts in an older spelling into the modern variant o...
Comparing k-means clusters on parallel Persian-English corpus
This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research on clustering English documents has been carried out. This raises the question of whether those results extend to other languages. Since the goal of document clustering is grouping of docum...
A System Architecture for Parallel Corpus-based Grammar Learning
This paper describes an architecture for exploiting implicit information about the grammar of the languages included in a parallel corpus. By initially applying statistical word alignment and defining an appropriate representation format for cross-linguistic structural correspondence, this implicit information can feed a system for bootstrapping grammars. The proposed architecture will be under...
On scalable parallel recursive backtracking
Supercomputers are equipped with an increasingly large number of cores to use computational power as a way of solving problems that are otherwise intractable. Unfortunately, getting serial algorithms to run in parallel to take advantage of these computational resources remains a challenge for several application domains. Many parallel algorithms can scale to only hundreds of cores. The limiting...
Journal
Journal title: Journal of Japan Society for Fuzzy Theory and Intelligent Informatics
Year: 2017
ISSN: 1347-7986, 1881-7203
DOI: 10.3156/jsoft.29.1_527